The use of needles to access sites within organs is fundamental to many interventional medical procedures both for diagnosis and treatment. Safe and accurate navigation of a needle through living tissue to an intra-tissue target is currently often challenging or infeasible due to the presence of anatomical obstacles in the tissue, high levels of uncertainty, and natural tissue motion (e.g., due to breathing). Medical robots capable of automating needle-based procedures in vivo have the potential to overcome these challenges and enable an enhanced level of patient care and safety. In this paper, we show the first medical robot that autonomously navigates a needle inside living tissue around anatomical obstacles to an intra-tissue target. Our system leverages an aiming device and a laser-patterned highly flexible steerable needle, a type of needle capable of maneuvering along curvilinear trajectories to avoid obstacles. The autonomous robot accounts for anatomical obstacles and uncertainty in living tissue/needle interaction with replanning and control and accounts for respiratory motion by defining safe insertion time windows during the breathing cycle. We apply the system to lung biopsy, which is critical in the diagnosis of lung cancer, the leading cause of cancer-related death in the United States. We demonstrate successful performance of our system in multiple in vivo porcine studies and also demonstrate that our approach leveraging autonomous needle steering outperforms a standard manual clinical technique for lung nodule access.
translated by 谷歌翻译
We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along common dimensions that support comparison. Several lines of research have proposed what we refer to as measurements, with differing terminology; we bring some of this work together, particularly in fields of computer vision and language, and build from it to motivate measuring data as a critical component of responsible AI development. Measuring data aids in systematically building and analyzing machine learning (ML) data towards specific goals and gaining better control of what modern ML systems will learn. We conclude with a discussion of the many avenues of future work, the limitations of data measurements, and how to leverage these measurement approaches in research and practice.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years and 497 unique users' data collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms' generalizability across different users and years. As a starting point, we provide the benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms.
translated by 谷歌翻译
颞叶(MTL)是一个包含海马和附近区域的大脑区域,被认为是哺乳动物中的体验构造系统,支持暂时扩展的事件序列的回忆和想象。此类功能也是AI研究领域中许多最近提出的``世界模型''的核心。从这种联系中汲取灵感,我们提出了一种新颖的变体,即双流世界模型(DSWM),该模型从高维观察和高维观察和学习中学习将它们分离为上下文和内容流。DSWM仅在一次曝光之后就可以在新颖的2D环境中可靠地产生想象中的轨迹,超过了标准的世界模型。DSWM还学习了潜在表示,这与在Hippocampus中建立的细胞非常相似。我们显示。该表示形式可作为强化学习基础功能,并且可以使用生成模型来帮助使用类似DYNA的更新来帮助策略学习过程。
translated by 谷歌翻译
我们提出了分支机构 - 培训 - 合并(BTM),这是一种用于对大型语言模型(LLMS)平行训练的沟通效率算法。我们表明,有可能在不同的数据子集上独立训练新的LLMS的子部分,从而消除了训练LLMS当前所需的大量多节点同步。 BTM学习了一组独立的专家LMS(ELMS),每个LMS(ELMS)专门针对不同的文本领域,例如科学或法律文本。可以添加和删除这些榆树以更新数据覆盖范围,并结合概括为新域,或者平均折叠回到单个LM以进行有效推理。通过从当前集合中的(混合物)分支,进一步训练新域的数据参数,然后将结果模型归还到该集合以备将来使用,从而学习新的榆树。实验表明,在控制训练成本时,与GPT型变压器LMS相比,BTM改善了与GPT风格的变压器LMS相比,可以改善内部和外部困惑。通过广泛的分析,我们表明这些结果对不同的ELM初始化方案是可靠的,但需要专家领域的专业化。具有随机数据拆分的LM合奏表现不佳。我们还提出了将BTM缩放到64个领域的新语料库(总计192B居民分开的代币)的研究;所得的LM(22.4B总参数)以及经过2.5倍计算训练的变压器LM。这些收益随域的数量增长,表明可以使用更具侵略性的并行性来有效地在未来的工作中训练更大的模型。
translated by 谷歌翻译
机器学习(ML)研究通常集中在模型上,而最突出的数据集已用于日常的ML任务,而不考虑这些数据集对基本问题的广度,困难和忠诚。忽略数据集的基本重要性已引起了重大问题,该问题涉及现实世界中的数据级联以及数据集驱动标准的模型质量饱和,并阻碍了研究的增长。为了解决此问题,我们提出Dataperf,这是用于评估ML数据集和数据集工作算法的基准软件包。我们打算启用“数据棘轮”,其中培训集将有助于评估相同问题的测试集,反之亦然。这种反馈驱动的策略将产生一个良性的循环,该循环将加速以数据为中心的AI。MLCommons协会将维护Dataperf。
translated by 谷歌翻译
尽管电子健康记录是生物医学研究的丰富数据来源,但这些系统并未在医疗环境中统一地实施,并且由于医疗保健碎片化和孤立的电子健康记录之间缺乏互操作性,可能缺少大量数据。考虑到缺少数据的案例的删除可能会在随后的分析中引起严重的偏见,因此,一些作者更喜欢采用多重插补策略来恢复缺失的信息。不幸的是,尽管几项文献作品已经通过使用现在可以自由研究的任何不同的多个归档算法记录了有希望的结果,但尚无共识,MI算法效果最好。除了选择MI策略之外,归纳算法及其应用程序设置的选择也至关重要且具有挑战性。在本文中,受鲁宾和范布伦的开创性作品的启发,我们提出了一个方法学框架,可以应用于评估和比较多种多个插补技术,旨在选择用于计算临床研究工作中最有效的推断。我们的框架已被应用于验证和扩展较大的队列,这是我们在先前的文献研究中提出的结果,我们在其中评估了关键患者的描述符和Covid-19的影响在2型糖尿病患者中的影响,其数据为2型糖尿病,其数据为2型糖尿病由国家共同队列合作飞地提供。
translated by 谷歌翻译
Deep learning (DL) is gaining popularity as a parameter estimation method for quantitative MRI. A range of competing implementations have been proposed, relying on either supervised or self-supervised learning. Self-supervised approaches, sometimes referred to as unsupervised, have been loosely based on auto-encoders, whereas supervised methods have, to date, been trained on groundtruth labels. These two learning paradigms have been shown to have distinct strengths. Notably, self-supervised approaches have offered lower-bias parameter estimates than their supervised alternatives. This result is counterintuitive - incorporating prior knowledge with supervised labels should, in theory, lead to improved accuracy. In this work, we show that this apparent limitation of supervised approaches stems from the naive choice of groundtruth training labels. By training on labels which are deliberately not groundtruth, we show that the low-bias parameter estimation previously associated with self-supervised methods can be replicated - and improved on - within a supervised learning framework. This approach sets the stage for a single, unifying, deep learning parameter estimation framework, based on supervised learning, where trade-offs between bias and variance are made by careful adjustment of training label.
translated by 谷歌翻译
人类无法访问许多空间,机器人可以帮助传感器和设备提供。这些空间中有许多包含三维通道和不均匀的地形,这些通道对机器人设计和控制构成了挑战。通过同时进行的远处和体材料反转移动的环形机器人有望在这些类型的空间中导航。我们提出了一种新型的柔软的环形机器人,该机器人在充满空气的膜内使用电动设备推动自己推动自己。我们的机器人只需要一个控制信号即可移动,可以符合其环境,并且可以垂直爬上电动机扭矩,该电动机与用来支撑机器人对环境的力无关。我们得出并验证了其运动所涉及的力的模型,并演示了机器人导航迷宫和攀登管道的能力。
translated by 谷歌翻译